17 research outputs found

    Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference

    Full text link
    Secure multi-party computation (MPC) allows users to offload machine learning inference to untrusted servers without having to share their privacy-sensitive data. Despite its strong security properties, MPC-based private inference has not been widely adopted in the real world due to its high communication overhead. When evaluating ReLU layers, MPC protocols incur a significant amount of communication between the parties, making end-to-end execution multiple orders of magnitude slower than its non-private counterpart. This paper presents HummingBird, an MPC framework that significantly reduces the ReLU communication overhead by using only a subset of the bits to evaluate ReLU on a smaller ring. Based on theoretical analyses, HummingBird identifies bits in the secret share that are not crucial for accuracy and excludes them during ReLU evaluation to reduce communication. With its efficient search engine, HummingBird discards 87--91% of the bits during ReLU and still maintains high accuracy. On a real MPC setup involving multiple servers, HummingBird achieves a 2.03--2.67x end-to-end speedup on average without introducing any errors, and up to an 8.64x average speedup when some accuracy degradation can be tolerated, owing to its up to 8.76x communication reduction.
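
    As a rough illustration of the reduced-ring idea, the plaintext sketch below keeps only a small window of high-order bits from each additive share before testing the sign that ReLU needs. The window position, width, and all names are assumptions for illustration, not HummingBird's actual protocol; note that dropping the low bits discards a possible carry, which is the kind of error source the paper's analysis would have to bound.

    ```python
    # Plaintext sketch of evaluating ReLU's sign test on a reduced ring.
    # Only a 4-bit window of each 32-bit share survives (~87% discarded).
    RING_BITS = 32          # original share ring Z_{2^32}
    WINDOW_HI = 31          # assumed: keep bits [28..31] only
    WINDOW_LO = 28

    def to_window(share: int) -> int:
        """Project a 32-bit share onto the small ring spanned by the window."""
        return (share >> WINDOW_LO) & ((1 << (WINDOW_HI - WINDOW_LO + 1)) - 1)

    def relu_sign_on_window(share0: int, share1: int) -> bool:
        """Reconstruct only the windowed bits and test the sign bit there.

        In a real MPC deployment this comparison would itself be a secure
        protocol; reconstructing in the clear here only shows why fewer
        bits mean less communication. Dropping the low bits ignores any
        carry out of them, which is where approximation error can enter.
        """
        small_ring = 1 << (WINDOW_HI - WINDOW_LO + 1)
        windowed = (to_window(share0) + to_window(share1)) % small_ring
        msb = small_ring >> 1
        return (windowed & msb) == 0   # True -> treat value as non-negative

    # Example: x = -5 shared additively as share0 + share1 (mod 2^32).
    x = -5 % (1 << RING_BITS)
    share0 = 0x1234ABCD
    share1 = (x - share0) % (1 << RING_BITS)
    print(relu_sign_on_window(share0, share1))  # False: ReLU would output 0
    ```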

    Intermittent Computing: Challenges and Opportunities

    Get PDF
    The maturation of energy-harvesting technology and ultra-low-power computer systems has led to the advent of intermittently-powered, batteryless devices that operate entirely using energy extracted from their environment. Intermittently operating devices present a rich vein of programming languages research challenges, and the purpose of this paper is to illustrate these challenges to the PL research community. To provide depth, this paper includes a survey of the hardware and software design space of intermittent computing platforms. On the foundation of these research challenges and the state of the art in intermittent hardware and software, this paper describes several future PL research directions, emphasizing a connection between intermittence, distributed computing, energy-aware programming and compilation, and approximate computing. We illustrate these connections with a discussion of our ongoing work on programming for intermittence, and on building and simulating intermittent distributed systems.
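
    To make "intermittent execution" concrete, here is a toy simulation under an assumed model: a dictionary stands in for non-volatile memory, and a random exception stands in for harvested energy running out. Real platforms checkpoint to FRAM or flash from C on a microcontroller; this is only the shape of the problem.

    ```python
    # Toy simulation of intermittent execution: progress survives power
    # failures only if it was checkpointed to "non-volatile" state first.
    import random

    nvm = {"i": 0, "acc": 0}            # persists across "power failures"

    def run_until_power_fails():
        i, acc = nvm["i"], nvm["acc"]   # restore state from last checkpoint
        while i < 10:
            if random.random() < 0.3:   # harvested energy runs out
                raise RuntimeError("power failure")
            acc += i                    # the actual work
            i += 1
            nvm["i"], nvm["acc"] = i, acc   # checkpoint after each step
        return acc

    while True:
        try:
            print("result:", run_until_power_fails())   # result: 45
            break
        except RuntimeError:
            pass                        # reboot: re-enter from checkpoint
    ```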

    Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

    Full text link
    Online personalized recommendation services are generally hosted in the cloud, where users query the cloud-based model to receive recommendations such as merchandise of interest or news feed items. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, little attention has been paid to their potential information leakage. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index into the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.
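
    The core leakage channel can be shown in a few lines: even if every arithmetic step after the lookup were hidden, the indices used to gather rows from the embedding table are visible to whoever hosts the table. The table size, dimensions, and item IDs below are made up for illustration.

    ```python
    # Minimal sketch of access-pattern leakage from sparse-feature lookups.
    import numpy as np

    rng = np.random.default_rng(0)
    embedding_table = rng.standard_normal((10_000, 16))  # items x embed dim

    observed_accesses = []  # what an untrusted cloud operator could log

    def embed(clicked_item_ids):
        observed_accesses.append(list(clicked_item_ids))      # the leakage
        return embedding_table[clicked_item_ids].sum(axis=0)  # pooled embedding

    # Two queries sharing click history can be linked to the same user,
    # letting the operator profile that user without seeing model outputs.
    embed([42, 7, 9_000])
    embed([42, 7, 123])
    print(observed_accesses)  # [[42, 7, 9000], [42, 7, 123]]
    ```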

    GPU-based Private Information Retrieval for On-Device Machine Learning Inference

    Full text link
    On-device machine learning (ML) inference can enable the use of private user data on user devices without revealing it to remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables too large to be stored on-device. In particular, recommendation models typically use multiple embedding tables, each on the order of 1-10 GB of data, making them impractical to store on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) propose novel GPU-based acceleration of PIR, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than 20× over an optimized CPU PIR implementation, and our PIR-ML co-design provides an over 5× additional throughput improvement at fixed model quality. Together, for various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second -- a >100× throughput improvement over a CPU-based baseline -- while maintaining model accuracy.
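
    For intuition about what PIR guarantees, the sketch below implements the textbook two-server XOR construction rather than the paper's GPU-accelerated scheme: each server sees only a uniformly random index set, yet the client recovers exactly the embedding row it wants.

    ```python
    # Toy two-server XOR-based PIR over a tiny "embedding table".
    import secrets

    def xor_rows(db, idxs):
        out = bytes(len(db[0]))
        for i in idxs:
            out = bytes(a ^ b for a, b in zip(out, db[i]))
        return out

    db = [secrets.token_bytes(8) for _ in range(16)]   # 16 embedding rows
    want = 5

    # Client: random subset S to server A, S symmetric-diff {want} to B.
    # Each query alone is a uniformly random set, revealing nothing.
    S = {i for i in range(len(db)) if secrets.randbits(1)}
    S_b = S ^ {want}

    answer_a = xor_rows(db, S)     # computed by server A
    answer_b = xor_rows(db, S_b)   # computed by server B

    # XORing the answers cancels every row except row `want`.
    row = bytes(a ^ b for a, b in zip(answer_a, answer_b))
    assert row == db[want]
    print(row.hex())
    ```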

    Intermittent asynchronous peripheral operations

    No full text
    Energy harvesting enables battery-less sensing applications, but causes executions to become intermittent as a result of erratic energy provisioning. Intermittent executions pose challenges to peripheral consistency that threaten to leave peripheral-bound workloads in failed states or to impede forward progress of programs. Intermittent synchronous peripheral operations are supported in existing literature for specific kinds of peripherals. Asynchronous peripheral operations enable reactive concurrency in application implementations, which increases reactivity and improves energy consumption, but they lack dedicated support in intermittent settings. We present Karma, the first general abstraction and system design to support both synchronous and asynchronous operations in an intermittent setting. Karma employs a novel combination of peripheral roll-forward and computation roll-back to a rendezvous point, guaranteeing consistency. It remains transparent to application programmers and peripheral drivers, which favours portability. Our evaluation, based on three applications running on prototype hardware and using diverse energy sources, indicates that the intermittent asynchronous peripheral support provided by Karma boosts data throughput by 83% compared to existing literature.
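
    A highly simplified sketch of the rendezvous pattern suggested by the abstract, as an illustrative guess rather than Karma's actual design: a completed peripheral operation's result is persisted so it is never re-executed (roll-forward), while interrupted computation restarts at the program point that consumes it (roll-back to the rendezvous).

    ```python
    # Toy roll-forward / roll-back rendezvous; all names are hypothetical.
    nvm = {"radio_ack": None, "resume_at": "start"}  # survives power loss

    def radio_send_async(payload):
        # Peripheral op completes in the background; its outcome is
        # committed to non-volatile state so a later power failure
        # cannot undo it (roll-forward).
        nvm["radio_ack"] = f"ack:{payload}"

    def boot():
        if nvm["resume_at"] == "start":
            radio_send_async("temp=21C")
            nvm["resume_at"] = "rendezvous"      # roll-back target
            raise RuntimeError("power failure")  # simulate losing RAM
        # Rendezvous: consume the rolled-forward peripheral result
        # without re-driving the radio.
        return nvm["radio_ack"]

    try:
        boot()
    except RuntimeError:
        pass           # device reboots...
    print(boot())      # ...resumes at the rendezvous: "ack:temp=21C"
    ```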

    Intermittent Computing with Peripherals, Formally Verified

    Get PDF
    Transiently-powered systems featuring non-volatile memory as well as external peripherals enable the development of new low-power sensor applications. However, as programmers, we are ill-equipped to reason about systems where power failures are the norm rather than the exception. A first challenge consists in being able to capture all the volatile state of the application -- external peripherals included -- to ensure progress. A second, more fundamental, challenge consists in specifying how power failures may interact with peripheral operations. In this paper, we propose a formal specification of intermittent computing with peripherals, an axiomatic model of interrupt-based checkpointing as well as its proof of correctness, machine-checked in the Coq proof assistant. We also illustrate our model with several systems proposed in the literature.

    Efficient lock-free durable sets

    No full text